Avoid intermediate DAGCircuit construction in 2q synthesis #12179

Merged: 1 commit, Apr 30, 2024


mtreinish
Member

Summary

This commit builds on #12109, which added a DAG output to the two-qubit
decomposers used by unitary synthesis. It adds a mode of operation in
unitary synthesis that avoids creating an intermediate DAG for each
synthesized gate. To do this efficiently, the UnitarySynthesis pass now
rebuilds the DAG instead of doing a node substitution.
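
As a rough illustration of the "rebuild instead of substitute" idea, here is a toy sketch in plain Python. It is not the pass implementation: operations are modeled as `(name, qubits)` tuples in a list rather than DAGCircuit nodes, and `synthesize` and `rebuild` are hypothetical helper names.

```python
# Toy model only: ops are (name, qubits) tuples in a list, not DAGCircuit nodes.
from typing import Callable, List, Tuple

Op = Tuple[str, Tuple[int, ...]]


def synthesize(op: Op) -> List[Op]:
    """Hypothetical stand-in for 2q synthesis: expand one 'unitary' op."""
    _, qubits = op
    return [("u", (qubits[0],)), ("cx", qubits), ("u", (qubits[1],))]


def rebuild(ops: List[Op], synth: Callable[[Op], List[Op]] = synthesize) -> List[Op]:
    """Build a fresh op list in one pass, emitting synthesized ops directly."""
    out: List[Op] = []
    for op in ops:
        if op[0] == "unitary":
            # Synthesized ops go straight into the output; no per-gate
            # intermediate container is built and then spliced in.
            out.extend(synth(op))
        else:
            out.append(op)
    return out


circuit = [("h", (0,)), ("unitary", (0, 1)), ("cx", (1, 2))]
result = rebuild(circuit)
```

The input list is left untouched and every non-unitary operation is copied over, which hints at why a rebuild can cost more than substitution when only a few operations need synthesis.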

Details and comments

This depends on #12109 and will need to be rebased after that merges.

@mtreinish added the "on hold" (Can not fix yet), "performance", and "Changelog: None" (Do not include in changelog) labels Apr 14, 2024
@mtreinish added this to the 1.1.0 milestone Apr 14, 2024
@qiskit-bot
Collaborator

One or more of the following people are requested to review this:

  • @Qiskit/terra-core
  • @levbishop

@mtreinish
Member Author

mtreinish commented Apr 14, 2024

I ran some quick asv benchmarks and they show a modest performance improvement over #12109:

Benchmarks that have improved:

       before           after         ratio
     [99c66b44]       [cdaf174a]
     <dag-2q-synth>       <no-dag-2q-synth>
-         311±2ms          281±4ms     0.90  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(1)
-         107±1ms       96.7±0.7ms     0.90  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(0)
-      86.0±0.4ms         76.4±1ms     0.89  transpiler_levels.TranspilerLevelBenchmarks.time_schedule_qv_14_x_14(1)
-        79.4±2ms       70.3±0.3ms     0.89  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(0)
-      65.8±0.3ms      57.4±0.09ms     0.87  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(1)

Benchmarks that have stayed the same:

       before           after         ratio
     [99c66b44]       [cdaf174a]
     <dag-2q-synth>       <no-dag-2q-synth>
        225±0.9ms        227±0.4ms     1.01  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(2)
          229±2ms          230±1ms     1.01  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(2)
       54.2±0.2ms       54.4±0.1ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(0)
             2565             2565     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(0)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(1)
             1403             1403     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(2)
             1296             1296     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_quantum_volume_transpile_50_x_20(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm(3)
             2705             2705     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(0)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(1)
             2005             2005     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(2)
                7                7     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_from_large_qasm_backend_with_prop(3)
              323              323     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(0)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(1)
              336              336     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(2)
              272              272     1.00  transpiler_levels.TranspilerLevelBenchmarks.track_depth_transpile_qv_14_x_14(3)
          114±2ms        114±0.2ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(1)
       56.3±0.2ms       56.2±0.3ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(0)
          116±1ms        116±0.4ms     1.00  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(1)
        128±0.6ms          127±2ms     0.99  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm(3)
        129±0.5ms        128±0.2ms     0.99  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_from_large_qasm_backend_with_prop(3)
          1.43±0s       1.40±0.01s     0.97  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(2)
          273±1ms        266±0.8ms     0.97  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(3)
        244±0.7ms          236±1ms     0.97  transpiler_levels.TranspilerLevelBenchmarks.time_transpile_qv_14_x_14(2)
       1.82±0.04s          1.75±0s     0.96  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(3)
          815±3ms          784±7ms     0.96  transpiler_levels.TranspilerLevelBenchmarks.time_quantum_volume_transpile_50_x_20(0)

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@mtreinish
Member Author

The other question for me is whether we want to have logic to default to the old mode. I did a quick benchmark comparing the runtime of a 1000-gate 2q circuit (I should probably use a deeper example, as this isn't super realistic) where each gate was either a cx gate or a random unitary, and did a sweep over the percentage of unitaries. It looks like there is a crossover point at slightly less than 2%:

[Image: comparison]

It wouldn't be very hard to compute the percentage of unitary gates in the circuit and use the old code (without this PR in its current form) if it's < 0.02 or some other threshold we decide on. That might give us the best of both worlds, depending on how many unitary gates are in the circuit.
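
A minimal sketch of that heuristic, assuming the circuit can be summarized as a list of operation names. The names `unitary_fraction` and `choose_mode` are hypothetical, and the 0.02 default is only the rough crossover from the sweep above, not a tuned value.

```python
from typing import Iterable, List

# Assumed cutoff, taken from the rough crossover point observed in the sweep.
UNITARY_FRACTION_THRESHOLD = 0.02


def unitary_fraction(op_names: Iterable[str]) -> float:
    """Fraction of operations named 'unitary'; 0.0 for an empty circuit."""
    names: List[str] = list(op_names)
    if not names:
        return 0.0
    return sum(name == "unitary" for name in names) / len(names)


def choose_mode(op_names: Iterable[str],
                threshold: float = UNITARY_FRACTION_THRESHOLD) -> str:
    """Pick node substitution for sparse unitaries, a full rebuild otherwise."""
    if unitary_fraction(op_names) < threshold:
        return "substitute"
    return "rebuild"


mostly_cx = ["cx"] * 990 + ["unitary"] * 10   # 1% unitary gates
mixed = ["cx"] * 900 + ["unitary"] * 100      # 10% unitary gates
modes = (choose_mode(mostly_cx), choose_mode(mixed))
```

Counting names is a single linear pass, so the check itself should be cheap relative to synthesis.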

@coveralls

coveralls commented Apr 14, 2024

Pull Request Test Coverage Report for Build 8877774475

Details

  • 60 of 60 (100.0%) changed or added relevant lines in 1 file are covered.
  • 17 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.01%) to 89.431%

Files with Coverage Reduction                              New Missed Lines   %
qiskit/transpiler/passes/synthesis/unitary_synthesis.py    2                  88.2%
crates/qasm2/src/lex.rs                                    3                  92.62%
crates/qasm2/src/parse.rs                                  12                 97.15%

Totals
Change from base Build 8870157056: -0.01%
Covered Lines: 60951
Relevant Lines: 68154

💛 - Coveralls

Contributor

@ElePT ElePT left a comment

Mostly LGTM, I just left one small comment in the code, and as mentioned in #12195, I like the idea of using the old code to gain a bit of speed on the cases with fewer unitaries. I think it's a use case that might come up when someone builds a benchmark, but if there's no time before the release deadline I also think it's fine to do it as a follow-up.

    getattr(circ, name)(*params, *qubits)
except AttributeError as exc:
    if use_dag:
        from qiskit.dagcircuit.dagcircuit import DAGCircuit
Contributor

This was added to avoid the cyclic import complaints, right? In #12203 I had too many of those complaints, and I opted for moving the synthesis imports in qiskit.circuit.library to import at runtime.

Member Author

I think moving the synthesis imports to runtime makes the most sense. I wrote this for #12109, so I don't really remember the exact cause of the import cycle. But I wouldn't have used a runtime import unless I needed to avoid a cycle.
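
A minimal sketch of the runtime (deferred) import pattern under discussion, using a standard-library class as a stand-in for the module that would otherwise form a cycle. `make_container` and its `use_dag` flag are illustrative names, not Qiskit code.

```python
def make_container(use_dag: bool):
    """Return an ordered mapping on the 'dag' path, a plain list otherwise."""
    if use_dag:
        # Deferred import: runs only when this branch executes, so the
        # module is not needed while the enclosing module is being imported.
        # OrderedDict is a stand-in for the class that would cause the cycle.
        from collections import OrderedDict

        return OrderedDict()
    return []


dag_like = make_container(use_dag=True)
list_like = make_container(use_dag=False)
```

Python caches imported modules, so repeated calls only pay for a lookup after the first import.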

Comment on lines 672 to 680
elif name == "u3":
    gate = U3Gate(*params)
    circ.append(gate, qubits)
elif name == "u2":
    gate = U2Gate(*params)
    circ.append(gate, qubits)
elif name == "u1":
    gate = U1Gate(*params)
    circ.append(gate, qubits)
Contributor

Is there any reason why you didn't use GATE_NAME_MAP here too?

Member Author

This was actually left over from #12109. I've rebased the PR now and this isn't here anymore. TBH, I'm not sure why I didn't use the dict here too. I guess my thinking at the time was that for the QuantumCircuit path we can use the gate methods on the QuantumCircuit class except for these 4 cases. But I was also likely trying to minimize the diff in #12109 and not really touch the circuit path. I can follow up and deduplicate this in a separate PR if you'd like, though.
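
For reference, the kind of deduplication being discussed could look roughly like the sketch below: a name-to-class dictionary replaces the `elif` chain. Everything here is a stand-in. The `U1`, `U2`, and `U3` classes, the `GATE_BY_NAME` dict, and `append_gate` are hypothetical and only mimic the shape of the real gate classes and mapping.

```python
class StandInGate:
    """Placeholder for a gate class; just stores its parameters."""

    def __init__(self, *params):
        self.params = params


class U1(StandInGate):
    pass


class U2(StandInGate):
    pass


class U3(StandInGate):
    pass


# Hypothetical name-to-class mapping, analogous in spirit to a gate name map.
GATE_BY_NAME = {"u1": U1, "u2": U2, "u3": U3}


def append_gate(circ: list, name: str, params, qubits) -> None:
    """Look the gate class up by name instead of branching on each name."""
    try:
        gate_cls = GATE_BY_NAME[name]
    except KeyError as exc:
        raise ValueError(f"unknown gate name: {name!r}") from exc
    circ.append((gate_cls(*params), tuple(qubits)))


circ = []
append_gate(circ, "u3", (0.1, 0.2, 0.3), [0])
append_gate(circ, "u1", (0.5,), [1])
```

Adding another gate then means adding one dictionary entry rather than another branch.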

Contributor

@ElePT ElePT left a comment

LGTM. The on hold label can be removed now, right?

@mtreinish removed the "on hold" (Can not fix yet) label Apr 30, 2024
@mtreinish
Member Author

Yep, my bad; it was left over from when this was waiting for #12109 to merge.

@mtreinish added this pull request to the merge queue Apr 30, 2024
Merged via the queue into Qiskit:main with commit 958cc9b Apr 30, 2024
13 checks passed
@mtreinish deleted the no-dag-2q-synth branch April 30, 2024 17:10
ElePT pushed a commit to ElePT/qiskit that referenced this pull request May 31, 2024